Abstract

Energy-efficient real-time task scheduling has attracted a lot of attention in the past decade. Most of the time, deterministic execution lengths for tasks were considered, but this model fits reality less and less, especially with the increasing number of multimedia applications. This is why a lot of research is starting to consider stochastic models, where execution times are only known stochastically. However, authors usually assume fairly precise knowledge of the properties of the system, especially regarding the worst case execution time (or worst case execution cycles, WCEC).

In this work, we relax this hypothesis and assume that the WCEC can vary. We propose several methods to react to such a situation, and give simulation results showing that, with a small effort, we can keep a low deadline miss rate as well as an energy consumption similar to that of clairvoyant algorithms.



1 Introduction 
1.1 Motivations 

In the past decade, energy-efficient systems have been actively explored. With the tremendous increase in the number of mobile devices, research in this field is still only at its beginning. Moreover, as many of those devices now run multimedia applications, real-time stochastic low-power systems will have a major role to play in the next few years. One characteristic of stochastic systems is that their parameters are likely to change over time. For instance, a device decoding a movie will need more time to process a sequence with a lot of color and movement than a dark and quiet sequence. What authors usually propose in the literature is to continuously observe the task execution times, in order to update their knowledge about the distribution. Once this distribution seems to be too far away from the one used by the scheduler, the scheduler is updated. But if the worst case execution time seems to increase, the system cannot afford to wait until enough information has been collected to update the scheduler: some action needs to be taken as soon as possible in order to avoid missing deadlines. This is the aim of this work: to propose an efficient strategy to react to WCEC variation.

1.2 Related work 

Energy-efficient real-time task scheduling attracted a lot of attention over the past decade. Low-power real-time systems with stochastic or unknown duration have been studied for several years. The problem was first considered in systems with only one task, or systems in which each task gets a fixed amount of time. Gruian [5, 6] or Lorch and Smith [8, 9] both showed that when intra-task frequency change is available, the most efficient way to save energy is to increase the speed progressively.

Solutions using a discrete set of frequencies and taking speed change overhead into account have also been proposed [13, 12]. For inter-task frequency changes, some work has already been undertaken. In [10], the authors consider a model similar to the one we consider here, even if this model is presented differently. The authors present several dynamic power management techniques, with different aggressiveness levels. In [1], the authors attempt to allow the manager to tune this aggressiveness level, while in [12], they propose to adapt this aggressiveness automatically using the distribution of the number of cycles for each task. The same authors have also proposed a strategy taking the number of available speeds into account from the beginning, instead of patching algorithms developed for continuous-speed processors [11]. In [2], we generalized and unified the model presented in several of the previous papers, and proposed two contributions: first, we gave a general necessary and sufficient condition of schedulability for this model, and second, we presented a new approach to adapt a continuous-speed-based method to a discrete-speed system. Some multiprocessor extensions have been considered in [4].

1.3 Model 

The model considered in this document is the same as the one presented in [2] by the same authors. We summarize it here for the sake of completeness, but the interested reader is invited to have a look at the given reference.

• We have $N$ tasks $\{T_i, i \in [1, \dots, N]\}$ which run on a DVS CPU. They all share the same deadline and period $D$ (also known as the frame), do not have any offset (periods are then synchronised) and are executed in the order $T_1, T_2, \dots, T_N$. The maximum number of execution cycles of $T_i$ is $w_i$;

• The CPU can run at $M$ frequencies $f_1 < \cdots < f_M$;

• We have $N$ scheduling functions $S_i(t)$ for $i \in [1, \dots, N]$ and $t \in [0, D]$. This function means that if $T_i$ starts its execution at time $t$, it will run until its end (unless the task is suspended before) at frequency $S_i(t)$, where $S_i(t) \in \{f_1, f_2, \dots, f_M\}$. $S_i(t)$ is then a step function (piece-wise constant function), with only $M$ possible values.

This model generalizes several scheduling strategies proposed in the literature, such as [11, 12]. Figure 1 shows an example of such a set of scheduling functions.

A scheduling function can be modelled by a set of points (black dots in Figure 1), each representing the beginning of a step. $|S_i|$ is the number of steps of $S_i$. $S_i[k]$, $k \in \{1, \dots, |S_i|\}$, is one point, with $S_i[k].t$ being its time component and $S_i[k].f$ its frequency. $S_i$ then has the same value $S_i[k].f$ in the interval $[S_i[k].t, S_i[k+1].t)$ (with $S_i[|S_i|+1].t = \infty$), and we have

$$S_i(t) = S_i[k].f \quad \text{where } k = \max\{ j \in \{1, \dots, |S_i|\} \mid S_i[j].t \le t \}.$$

Notice that finding $k$ can be done in $O(\log |S_i|)$ (by binary search), and, except in the case of very particular models, $|S_i| \le M$.
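The point-based representation and its logarithmic lookup can be illustrated with a minimal Python sketch. The helper name `make_step_function` and the three-step example are assumptions made for illustration; they do not come from the paper.

```python
from bisect import bisect_right

def make_step_function(points):
    """Build S_i from its step points.

    points: list of (t, f) pairs sorted by t, the first one with t == 0.
    """
    times = [t for t, _ in points]
    freqs = [f for _, f in points]

    def s(t):
        # k is the index of the last point whose time component is <= t,
        # found by binary search in O(log |S_i|).
        k = bisect_right(times, t) - 1
        if k < 0:
            raise ValueError("t is before the first step")
        return freqs[k]

    return s

# Hypothetical scheduling function with three steps.
s_i = make_step_function([(0.0, 100), (4.0, 200), (7.5, 400)])
```

Calling `s_i(3.0)` returns the frequency of the first step, and `s_i(4.0)` already returns the second one, since each step interval is closed on the left.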

Energy and time overhead for frequency changes can 
easily be taken into account in the model. 



Figure 1 Example of scheduling with functions $S_i(t)$. We have 5 tasks $T_1, \dots, T_5$, running every $D$. $T_1$ is run at frequency $S_1(t_1)$, $T_2$ at $S_2(t_2)$, $T_3$ at $f_4 = S_3(t_3)$, etc.

1.3.1 Schedulability Condition

The scheduling functions $S_i(t)$ can be pretty general, but have to respect some constraints in order to ensure the system schedulability and avoid deadline misses. If tasks never need more cycles than their worst case execution cycles (WCEC), or, in other words, if the knowledge about the WCEC is correct, we show in [2] that the following (necessary and sufficient) condition ensures that every job in a frame can finish before the end of this frame (if the system is expedient and $T_1$ starts at the beginning of the frame):

$$S_i(t) \ge \frac{w_i}{z_{i+1} - t} \quad \forall i \in [1, \dots, N],\ t \in [0, z_i], \qquad \text{where } z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k. \tag{1}$$

We do not take frequency change overheads into account, but we have shown in [2] that those penalties are easy to integrate.


1.3.2 Danger Zone 

In [2], we define the concept of Danger Zone. The danger zone of a task $T_i$ starts at $z_i$, where $z_i$ is such that if this task is not started immediately, we cannot ensure that this task and every subsequent task can all be finished by the deadline (assuming the WCEC are correctly known). In other words, if a task starts in its danger zone, and this task and all the subsequent ones use their WCEC, then even at the highest frequency, some tasks will miss the (common) deadline. The danger zone of $T_i$ is the range $]z_i, D]$, where

$$z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k. \tag{2}$$
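As an illustration of Equation (2), the following Python sketch computes the start of every danger zone and the lower bound that the schedulability condition imposes on $S_i(t)$. The function names and the numeric values are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def danger_zones(wcec, deadline, f_max):
    """Return [z_1, ..., z_N, z_{N+1}] with z_i = D - (1/f_M) * sum_{k>=i} w_k."""
    zones = [deadline]              # z_{N+1} = D (empty sum)
    remaining = 0.0
    for w in reversed(wcec):
        remaining += w
        zones.append(deadline - remaining / f_max)
    zones.reverse()
    return zones

def required_frequency(w_i, z_next, t):
    """Lower bound w_i / (z_{i+1} - t) on S_i(t), valid for t < z_{i+1}."""
    return w_i / (z_next - t)

# Hypothetical frame: 3 tasks, D = 100 time units, f_M = 1 cycle per time unit.
zones = danger_zones([20, 10, 30], deadline=100, f_max=1)
```

With these values, `zones` is `[40.0, 60.0, 70.0, 100]`, and `required_frequency(20, zones[1], zones[0])` equals `1.0`: a task started exactly at the beginning of its danger zone must run at $f_M$.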

1.3.3 Notation 

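In this paper, we will use the following notation:

$$\lceil x \rceil_A = \begin{cases} \min_{y \in A} \{ y \mid y \ge x \} & \text{if } x \le \max\{A\}, \\ \max_{y \in A} \{ y \} & \text{otherwise.} \end{cases}$$

This notation generalizes the classical ceiling notation $\lceil x \rceil$ (the case where $A$ is the set of integers). For the sake of readability, $\lceil x \rceil_F$ will denote $\lceil x \rceil_{\{f_1, \dots, f_M\}}$, or, in other words, the first frequency higher than or equal to $x$, or $f_M$ if $x$ is higher than $f_M$.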
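A minimal Python sketch of this generalized ceiling follows, assuming the set of frequencies is stored as a sorted list. The name `ceil_in` is ours, not the paper's.

```python
from bisect import bisect_left

def ceil_in(x, values):
    """Smallest element of the sorted list `values` that is >= x,
    or max(values) when x is larger than every element."""
    k = bisect_left(values, x)
    return values[k] if k < len(values) else values[-1]

# Hypothetical set of available frequencies.
FREQS = [100, 200, 400]
```

For example, `ceil_in(150, FREQS)` gives `200`, while `ceil_in(500, FREQS)` saturates at `400`.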

Table 1 summarizes the symbols used in this document.



Table 1 Symbols used in this document.

Symbol                  Meaning
$c_i$                   Effective number of cycles of $T_i$
$D$                     Deadline
$f_i$                   Available frequencies
$M$                     Number of available frequencies
$N$                     Number of tasks
$\kappa_i(\varepsilon)$ $\operatorname{argmin}_\kappa \{ P[c_i < \kappa] \ge 1 - \varepsilon \}$
$S_i(t)$                Frequency if $T_i$ starts at $t$
$T_i$                   Task number $i$
$w_i$                   Worst case execution cycles (WCEC) of $T_i$
$z_i$                   Beginning of the danger zone of $T_i$



1.4 Varying WCEC 

Let us assume that we have a set of functions $S_i(\cdot)$, supposed to be as efficient as possible, thanks to any algorithm such as [2] or [11]. Those functions have been computed knowing the execution length distribution, as well as the worst case execution cycles $w_i$.

It is realistic to consider that the distribution can vary over time. As long as $w_i$ does not change, there is no problem: from time to time, when the current distribution is too far away from the distribution used for the computation of $S_i(\cdot)$, those functions are rebuilt. The computation of those functions can for instance be done during the slack time of any frame, or on another CPU. Meanwhile, the "old" functions can still be used without risking deadline violation. In the worst case, outdated functions can make the system consume more energy than an up-to-date set of functions.



But the WCEC can also vary. If a $w_i$ decreases, again, this does not have any consequence on the schedulability. If this $w_i$ increases, this means that suddenly, an instance of $T_i$ runs over the maximal time it was supposed to run. This is the aim of this work: to study several possible reactions to this situation, reducing as much as possible deadline misses or task killings (which in some cases are intrinsically unavoidable) before the system can rebuild its functions.

The strategies we propose here are composed of two 
phases: 

• First, we have to decide what to do with tasks exceeding their $w_i$ in the current frame;

• Then, we have to adapt the functions $S_i(\cdot)$ in order to guarantee the schedulability of the next frame, taking into account the new $w_i$ (provided that other tasks are not longer than expected; otherwise, again, the schedulability is not guaranteed). Of course, we do not completely rebuild $S_i(\cdot)$, which would take too much time. Instead, we change them slightly, improving schedulability, but possibly reducing power consumption efficiency.

In this work, we will assume that we cannot tolerate that, at time $D$, some tasks of the current frame are still running. This means that we prefer to kill some tasks instead of missing deadlines. We then need a mechanism allowing a task to be killed if it is still running at some given time, which can be provided at the beginning of the execution of this task.

2 No preemption mechanism 

We first assume that tasks cannot be preempted: tasks can then be killed, but if it happens that there is some time left at the end of the frame, the killed task cannot be resumed. We consider that a situation becomes problematic once a task enters (or does not end before) the danger zone of the next task. We have mainly three solutions:

• We do not want any subsequent task to enter the next danger zone. We then kill the running task just before the danger zone of the next task;

• We let the task run without any control. At time $D$, we kill any task that would still be running;

• A hybrid solution.

2.1 Kill at danger zone 

In this case, we consider that a task running longer than expected is "guilty", and it is then killed as soon as it could prevent any task from meeting its deadline. When task $T_i$ is started, a timeout is set (and disabled when the task finishes) on $z_{i+1}$ (see Equation (2)), the start of the danger zone of $T_{i+1}$ (or $D$ if $i = N$).

The main disadvantage is that the danger zone is usually very pessimistic. There is then a high probability (depending upon the distribution of computation cycles) that the system kills $T_i$ although some time is still available after $T_N$ (except, of course, if $i = N$).

Another disadvantage of this technique is that, as the next task $T_{i+1}$ starts at the limit of its danger zone, it will be killed as soon as it uses more cycles than $w_{i+1}$. So killing a task "cancels" the laxity of the next one.

2.2 Kill at D 

Here, we accept that a task enters the next danger zone. Of course, if a task starts during its danger zone, it should run as fast as possible, so

$$S_i(t) = f_M \quad \forall t > z_i.$$

At the beginning of a frame, a timeout is set on $D$, and if a task is still running at this time, this task is killed, and any subsequent task is just dropped. The main advantage of this technique is that it allows the totality of the available time to be used. The "time overflow" then has to be huge before a task is killed.

However, the last task of a frame is always expected to run at a speed which allows it to use the totality of the remaining time if it uses its WCEC. Indeed, if the frequency cannot be changed in the middle of a task, this is usually the speed giving the minimum energy consumption. The laxity for the tasks at the beginning of a frame is then larger than the laxity at the end, which could be considered as unfair.

Another disadvantage is that if a task is killed or not run, the last one obviously does not reach its end, even if it does not require more than $w_N$. Then, the larger $i$, the larger the probability of being killed or not run. Again, this is unfair.

One possibility is to put tasks that are less important, or even optional, at the end of the frame.

2.3 Hybrid solutions 

A hybrid solution is to give more space to the last tasks while still starting some tasks in their danger zone, expecting them to run less than their WCEC. We allow any task to enter the danger zone of the next task, but the larger $i$, the less intrusion is allowed. In a strict solution, $T_i$ would be killed at $z_{i+1}$. We want to define a set of points $\{\tilde{z}_i\}$ such that $T_i$ is killed at $\tilde{z}_{i+1}$, with the following properties:

• $\tilde{z}_i \ge z_i$: we are less strict than just killing at the next danger zone;

• $\tilde{z}_i < \tilde{z}_{i+1}$: when a task is killed, the next one needs to have some time left;

• $\tilde{z}_{N+1} = D$: $T_N$ is killed at $D$ if still running (because of firm deadlines).

We propose two methods allowing this: the first one allows a task $T_i$ to use a fixed ratio of the time interval $[z_{i+1}, D]$; the second one takes the job length distribution into account.

The first one then defines

$$\tilde{z}_i = z_i + (D - z_i)\,\delta_i$$

for some $0 \le \delta_i \le 1$. We can easily see that $\delta = 1$ corresponds to "Kill at D", and $\delta = 0$ to "Kill at danger zone".
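The fixed-ratio rule can be sketched in a few lines of Python. The list layout (index $i-1$ holds the value for $z_i$) and the numeric values are assumptions made for this example only.

```python
def kill_times(zones, deadline, deltas):
    """Kill/suspend times z~_i = z_i + (D - z_i) * delta_i.

    zones[i-1] holds z_i for i = 1..N+1 (so zones[-1] == D);
    deltas[i-1] holds the ratio delta_i in [0, 1].
    """
    return [z + (deadline - z) * d for z, d in zip(zones, deltas)]

# Hypothetical danger zones for 3 tasks with D = 100.
ZONES = [40.0, 60.0, 70.0, 100.0]
strict = kill_times(ZONES, 100.0, [0.0] * 4)   # "Kill at danger zone"
lazy = kill_times(ZONES, 100.0, [1.0] * 4)     # "Kill at D"
half = kill_times(ZONES, 100.0, [0.5] * 4)     # halfway in between
```

With a constant ratio of 0.5, the kill times are `[70.0, 80.0, 85.0, 100.0]`: they stay strictly increasing and the last one is $D$, as required by the three properties listed above.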

2.3.1 Percentile approach 

An alternative hybrid solution is to consider that none of the subsequent tasks will need more than its $1 - \varepsilon$ percentile. If $\kappa_i(\varepsilon) \in [0, w_i]$ is defined as $\kappa_i(\varepsilon) = \operatorname{argmin}_\kappa \{ P[c_i < \kappa] \ge 1 - \varepsilon \}$, where $c_i$ is the effective number of cycles used ($\kappa_i(\varepsilon)$ is then the length which is not exceeded by $(1 - \varepsilon) \times 100\%$ of jobs), then we only accept $T_i$ to run up to

$$\tilde{z}_{i+1} = D - \frac{1}{f_M} \sum_{k=i+1}^{N} \kappa_k(\varepsilon).$$

In the following, for clarity, $\kappa_i$ stands for $\kappa_i(\varepsilon)$. For instance, we set $\varepsilon$ to 5%, and, using the distribution of the number of cycles, we compute $\kappa_k$ such that in 95% of cases, $T_k$ uses less than $\kappa_k$ cycles. Then we behave as if $\kappa_k$ was the WCEC of $T_k$ for each $k > i$.

The tolerance $\varepsilon$ does not have to be the same for every task: we could set $\varepsilon = 0$ for some important tasks, and $\varepsilon_i < \varepsilon_j$ if $T_i$ has a higher priority than $T_j$.

In order to improve the fairness between $T_N$ and other tasks, we can also consider a decreasing sequence of $\varepsilon$.

However, this technique could be considered as unfair: the first tasks have much more laxity than the last ones. A way of working around this would be to consider the $1 - \varepsilon$ percentile only for some fixed number of tasks, say $K$, using the following value:

$$\tilde{z}_{i+1} = D - \frac{1}{f_M} \left( \sum_{j=i+1}^{\min\{i+K, N\}} \kappa_j + \sum_{j=i+K+1}^{N} w_j \right).$$
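The windowed percentile rule can be sketched in Python as follows. This is only an illustration under stated assumptions: `kappa` estimates the percentile from a list of observed cycle counts (an empirical stand-in for the distribution, using "at most" rather than the strict inequality of the definition), and all names and numbers are hypothetical.

```python
import math

def kappa(samples, eps):
    """Smallest observed cycle count k such that at least a (1 - eps)
    share of the samples is <= k (empirical percentile)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil((1.0 - eps) * len(ordered)))
    return ordered[rank - 1]

def percentile_kill_time(i, kappas, wcec, deadline, f_max, window):
    """Kill time of T_i (1-based index): the next `window` tasks are
    budgeted with their percentile, the remaining ones with their WCEC."""
    n = len(wcec)
    last = min(i + window, n)
    budget = sum(kappas[j - 1] for j in range(i + 1, last + 1))
    budget += sum(wcec[j - 1] for j in range(i + window + 1, n + 1))
    return deadline - budget / f_max

# Hypothetical values: 3 tasks, D = 100, f_M = 1.
WCEC = [20, 10, 30]
KAPPAS = [15, 8, 20]
```

With a window of one task, `percentile_kill_time(1, KAPPAS, WCEC, 100, 1, 1)` budgets $T_2$ with its percentile and $T_3$ with its WCEC, so $T_1$ may run until time 62 instead of 60.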

3 Available preemption mechanism 

If tasks can be preempted, we can then suspend a task as soon as it enters the next danger zone (or later, at $\tilde{z}_{i+1}$) and resume it:

• Either after the end or the preemption of $T_N$. Of course, there is no guarantee that it finishes before $D$, but we can assume that in general, it is highly likely;

• Or as soon as some slack is available, which means that a task $T_j$ ends before $z_{j+1}$.

There are in this case several questions:

• At which speed should the resumed task be run?

• What if several tasks have been suspended?

3.1 Frequency for resumed task (resume at the 
end) 

The safest method would be to run a resumed task at speed $f_M$, but it will consume a lot of energy. In some cases, we might have more information. First, we could know the global worst case execution cycle number $W_i$, such that $w_i \le W_i$. In this case, we can choose the lowest frequency allowing the remaining part of $T_i$ to run, assuming it could use $W_i$ cycles in total. Then, if $T_i$ is the only task to run at time $t$, and if $c_i$ is the number of cycles already consumed, we can choose

$$f = \left\lceil \frac{W_i - c_i}{D - t} \right\rceil_F.$$

If $f_M < \frac{W_i - c_i}{D - t}$ (then $f = f_M$), this means that even at $f_M$, we cannot guarantee that the task finishes before $D$. But this does not mean that we are going to miss the deadline, because the task can of course require fewer cycles than $W_i$. In such a situation, if not missing the deadline is more important than energy consumption, $f_M$ is the best frequency.

In some situations, we could for instance assume that the WCEC does not vary too abruptly. In other terms, if $w_i$ is the current WCEC, a task will never need more than $w_i \cdot (1 + \alpha)$, for some $\alpha > 0$, or at least we accept to kill a task if this condition is not met. In that case, a similar frequency selection can be used:

$$f = \left\lceil \frac{w_i \cdot (1 + \alpha) - c_i}{D - t} \right\rceil_F.$$
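A minimal Python sketch of this frequency selection is given below, assuming frequencies are expressed in cycles per time unit and stored in a sorted list. The function name and the example values are hypothetical.

```python
from bisect import bisect_left

def resume_frequency(max_cycles, used_cycles, now, deadline, freqs):
    """Lowest available frequency able to execute the remaining
    (max_cycles - used_cycles) cycles before `deadline`; falls back to the
    highest frequency when none is sufficient. Requires now < deadline."""
    needed = (max_cycles - used_cycles) / (deadline - now)
    k = bisect_left(freqs, needed)
    return freqs[k] if k < len(freqs) else freqs[-1]

FREQS = [100, 200, 400]      # hypothetical frequencies, cycles per time unit
W_CURRENT = 800              # hypothetical current WCEC w_i
ALPHA = 0.25                 # assumed bound on the WCEC growth

# T_i has used 400 cycles at time 6 and the frame ends at D = 10.
f = resume_frequency(W_CURRENT * (1 + ALPHA), 400, 6, 10, FREQS)
```

Here the task may still need 600 cycles in 4 time units, that is 150 cycles per time unit, so the selected frequency is 200. Calling the function at time 9 instead would require 600 cycles per time unit and return $f_M = 400$ without any guarantee.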

3.2 Frequency for resumed task (resume at the 
first slack) 

Again, the most conservative way is to resume tasks at 
their maximal speed. This conservative method is of course 
the safest, but consumes more energy. The disadvantage of 
resuming tasks as soon as we have some available slack is 
that we don't really know the available total slack. We only 
know the time we have before the next tasks start, but some 
slack might be available later. So several heuristics might 
be considered: 

• Resume tasks at the current speed; 

• Resume tasks at $f_M$;



• According to the "emergency" (current available slack, 
number of tasks to be run, probability to find some 
slack later, . . . ), choose a frequency. 

3.3 Frequency for other tasks 

If we choose to resume tasks after $T_N$, and if a task has been suspended, a first method would be to just "forget" about this task until the end of $T_N$, which means choosing the frequency for subsequent tasks without taking this suspended task into account. But in order to reduce the power consumption, the system should select the smallest frequency allowing $T_N$ to finish just in time if it uses its WCEC. If the variance of the number of execution cycles is rather small, the slack time could be very small. This is especially true because, if the user has the choice, it is in general better to put tasks with smaller variance at the end: this allows the last task to finish very close to the deadline, which reduces the idle time and gives a better repartition.

A solution would be to increase the speed of tasks as soon as some tasks are waiting in the "suspended tasks queue". The most conservative way is to stop using the $S$ functions when tasks are suspended, and always use $f_M$. A more optimistic way would be to increase the speed that $S$ would have chosen, for instance by moving to the next available frequency, or by moving up as many frequency levels as there are suspended tasks.

3.4 Several tasks 

If several tasks have been interrupted and should be resumed, the situation is slightly more complex, and some strategy needs to be defined. Let $R$ be the set of tasks to resume.

First, if we do not have any information about the maximal number of remaining cycles, the safest way is to run tasks at $f_M$. We also have to choose in what order tasks from $R$ are resumed. The simplest way is to resume them in the order given by their indices. But this raises some fairness problems: tasks with a higher index have a higher likelihood of being killed. This could fit the user requirements if tasks are sorted according to their priority, but not in the case of uniform priorities.

In order to improve the fairness, a simple method consists in resuming tasks randomly, possibly with some criteria according to the user needs.

If we have, for every task in $R$, the knowledge of its global maximum $W_i$ (with possibly $W_i = w_i \cdot (1 + \alpha)$), we can then use the frequency

$$f = \left\lceil \frac{\sum_{T_i \in R} (W_i - c_i)}{D - t} \right\rceil_F.$$

Of course, for better energy consumption, this frequency has to be recomputed before resuming any task of $R$, because some previously resumed tasks could have required less time than their expected maximum.

3.4.1 Multiple rounds 

In the previous section, we only considered situations where resumed tasks are not suspended later. But we can also consider that tasks can be interrupted or resumed several times. We then have to decide whether it is better to finish the maximum number of tasks, or to progress in every task in parallel. It might be interesting to consider two families of tasks:

• If an unfinished task does not give any benefit, the time spent running it is just wasted. In this case, it would probably be better to run a task up to its completion;

• If an unfinished task can give some feedback (for instance with a loss of quality and/or precision, as is the case in the Imprecise Computation Model [7]), then we could try to improve the fairness instead of the number of finished tasks.

In the first case, we want to maximize the number of finished tasks. If we do not have any information about the remaining time, then any order is equivalent if there is no priority between tasks. If we have some information about the remaining time, then we could heuristically resume tasks sorted by smallest remaining time first.

In the second case, we could prefer to improve the fairness by running tasks in parallel. Of course, we are only interested in the fairness at time $D$, so we do not need "real" parallelism. For instance, we can allocate to each task in $R$ the same amount of time. Then, if $T_N$ ends at $t'$, every task gets $(D - t')/|R|$ (or any weighted repartition). If a task reaches its allocated amount of time, it is re-suspended. If some tasks needed less than their allocated time, a second round of resumes can be performed. This second round necessarily contains fewer tasks, since time is left only because some tasks ended before their allocated time. Notice that the worst number of preemptions occurs when, at each round, exactly one task finishes earlier than the time it receives. So if $r = |R|$, the number of preemptions is $(r - 1) + (r - 2) + \cdots + 1 = (r - 1) \times r/2$.

In order to avoid too many task switches, we could decide (arbitrarily) to run only one task if the remaining time goes below some bound.
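The equal-share rounds can be simulated with a short Python sketch. It is a simplified model, not the authors' implementation: the remaining need of each task is given as an input only to drive the simulation (a real system would simply observe completions), and context switch costs are ignored.

```python
def run_rounds(needs, slack, eps=1e-12):
    """Share `slack` equally among suspended tasks, round after round.

    needs: {task: time still required at the chosen speed}.
    Returns (finished tasks, unfinished tasks with remaining need, rounds).
    """
    needs = dict(needs)
    finished, rounds = [], 0
    while needs and slack > eps:
        rounds += 1
        share = slack / len(needs)        # same amount of time for each task
        for task in list(needs):
            used = min(share, needs[task])
            needs[task] -= used
            slack -= used
            if needs[task] <= eps:        # completed within its share
                finished.append(task)
                del needs[task]
            # otherwise the task is re-suspended until the next round
    return finished, needs, rounds

# Hypothetical example: 6 time units left after T_N, three suspended tasks.
finished, unfinished, rounds = run_rounds({"a": 1, "b": 5, "c": 2}, 6)
```

In this example, tasks `a` and `c` complete in the first round, which frees one time unit for a second round in which `b` progresses but is still unfinished at $D$.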

3.5 Preemption and intra-task frequency changes 

If a preemption mechanism is available, we can use it in order to increase the frequency of a task. For tasks respecting their WCEC, we assume that the scheduling functions do not give any information allowing the frequency to be changed optimally during execution. So if a task does not need more than its WCEC, we do not consider changing its speed in this work.

But if some task does not respect its expected delay, we can then interrupt it, and resume it immediately at a higher frequency. This is safer, especially if we do not have information about the global WCEC. Several policies can be considered, the safest being to use $f_M$ as soon as a WCEC is not respected. But this strategy is certainly the most energy consuming.

We can also take into account the laxity we have, that is, the remaining time before the next danger zone. We can for instance increase the frequency as the laxity diminishes.

4 Adaptation of $S_i(\cdot)$

If $T_i$ needed $c_i > w_i$ cycles in the previous frame, then $c_i$ might be considered as the new WCEC (the case where $T_i$ has been killed after $c_i$ cycles is considered later). Of course, it is usually not realistic to rebuild the whole set of scheduling functions $S_i(\cdot)$ before the next frame. We then choose to adapt the scheduling functions in order to guarantee the schedulability, where the WCECs are now $w'_i = \max\{c_i, w_i\}$.

Let us first consider that only one task $T_j$ overran its WCEC.

The scheduling now has to guarantee that even if $T_j$ again requires $c_j$ cycles, it will end before $z_{j+1}$. We know that $S_j(\cdot)$ guarantees that if $T_j$ requires $w_j$ cycles, it will end before $z_{j+1}$. So we need to build a set of functions $S'_i(t)$ such that (see Equation (1)):

• If $i < j$:

$$S'_i(t) \ge \frac{w_i}{D - t - \frac{1}{f_M}\left(c_j - w_j + \sum_{k=i+1}^{N} w_k\right)} \quad \text{knowing that} \quad S_i(t) \ge \frac{w_i}{D - t - \frac{1}{f_M}\sum_{k=i+1}^{N} w_k}. \tag{3}$$

• If $i = j$:

$$S'_j(t) \ge \frac{c_j}{z_{j+1} - t} \quad \text{knowing that} \quad S_j(t) \ge \frac{w_j}{z_{j+1} - t}. \tag{4}$$

• If $i > j$:

$$S'_i(t) \ge \frac{w_i}{z_{i+1} - t} \quad \text{knowing that} \quad S_i(t) \ge \frac{w_i}{z_{i+1} - t}. \tag{5}$$

For $i > j$, we can simply take $S'_i(t) = S_i(t)$: the lower bound of $S_i(t)$ does not depend on tasks running before $T_i$. We will now propose two adaptation methods for $i \le j$.

4.1 Using Schedulability Condition 

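A first adaptation consists in using the property that we want a function $S'_i(t)$ close to $S_i(t)$, but respecting the schedulability condition $S'_i(t) \ge \frac{w'_i}{z'_{i+1} - t}$, where $w'_i = \max\{c_i, w_i\}$ is the new WCEC of $T_i$ and $z'_{i+1}$ is the start of the danger zone of $T_{i+1}$ computed with the new WCECs. We then propose to simply use $S_i(t)$ as long as it respects the condition, and to use this condition when required. Or, more formally,

$$S'_i(t) = \max\left\{ S_i(t), \left\lceil \frac{w'_i}{z'_{i+1} - t} \right\rceil_F \right\}.$$

Then, if a task $T_j$ took $c_j > w_j$ cycles in this frame, we make the following changes for the next frame:

• Functions are changed in the following way:

$$S'_i(t) = \begin{cases} \max\left\{ S_i(t), \left\lceil \frac{w'_i}{z'_{i+1} - t} \right\rceil_F \right\} & \text{if } i \le j, \\ S_i(t) & \text{if } i > j. \end{cases}$$

Adaptation of $S_i$ for $i \in [1, \dots, j]$ can be done using Algorithm 1.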
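The idea of keeping the old scheduling function wherever it still satisfies the schedulability bound, and raising it to the bound elsewhere, can be illustrated pointwise in Python. This sketch evaluates the adapted function at a single start time instead of rewriting the step points; the function name, argument layout and numbers are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right

def adapted_frequency(points, t, new_wcec, z_next, freqs):
    """Frequency for a task starting at time t after its WCEC grew.

    points: sorted (time, frequency) steps of the old scheduling function;
    new_wcec: updated worst case number of cycles of the task;
    z_next: start of the danger zone of the following task;
    freqs: sorted list of available frequencies.
    """
    if t >= z_next:
        return freqs[-1]                  # no time left: run at f_M
    times = [p[0] for p in points]
    current = points[bisect_right(times, t) - 1][1]
    needed = new_wcec / (z_next - t)      # schedulability lower bound
    k = bisect_left(freqs, needed)
    bound = freqs[k] if k < len(freqs) else freqs[-1]
    return max(current, bound)

# Hypothetical old function and platform.
POINTS = [(0.0, 100), (4.0, 200), (7.5, 400)]
FREQS = [100, 200, 400]
```

With a new WCEC of 900 cycles and `z_next = 10`, a start at `t = 3` needs about 129 cycles per time unit, so the old value 100 is raised to 200, while a start at `t = 5` keeps the old value 200.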



4.2 Horizontal shift 

Another adaptation can be done using some properties of the function. For $i < j$, we propose to define

$$S'_i(t) = S_i\left(t + \frac{c_j - w_j}{f_M}\right).$$

This corresponds to a left shift by the amount of time required by the "new cycles", $(c_j - w_j)$, at the highest frequency. Let us now check the schedulability condition of $S'_i$ for $i < j$.



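Algorithm 1 Adaptation of $S_i$ (schedulability condition)

foreach $p \in \{2, \dots, |S_i|\}$ do
    if $S_i[p].f < \frac{c_j}{z_{i+1} - S_i[p].t}$ then
        $S_i[p].f \leftarrow \left\lceil \frac{c_j}{z_{i+1} - S_i[p].t} \right\rceil_F$
    if $S_i[p-1].f < \frac{c_j}{z_{i+1} - S_i[p].t}$ then
        if no frequency is available between $S_i[p].f$ and $S_i[p-1].f$ then
            $S_i[p].t \leftarrow z_{i+1} - \frac{c_j}{S_i[p-1].f}$
        else
            $S_i \leftarrow S_i \cup \left( z_{i+1} - \frac{c_j}{S_i[p-1].f},\ \lceil S_i[p-1].f^{+} \rceil_F \right)$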

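We have

$$S'_i(t) = S_i\left(t + \frac{c_j - w_j}{f_M}\right) \ge \frac{w_i}{D - \left(t + \frac{c_j - w_j}{f_M}\right) - \frac{1}{f_M}\sum_{k=i+1}^{N} w_k} = \frac{w_i}{D - t - \frac{1}{f_M}\left(c_j - w_j + \sum_{k=i+1}^{N} w_k\right)},$$

which is the condition (3).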

Combining the previous adaptation for $i = j$ and this one for $i < j$, $S_i$ can be adapted in the following way:

$$S'_i(t) = \begin{cases} S_i\left(t + \frac{c_j - w_j}{f_M}\right) & \text{if } i < j, \\ \max\left\{ S_i(t), \left\lceil \frac{c_j}{z_{i+1} - t} \right\rceil_F \right\} & \text{if } i = j, \\ S_i(t) & \text{if } i > j. \end{cases}$$

This adaptation can be done in $O(j \times M + M \log M)$ (if $|S_i| \le M\ \forall i$), both numbers being usually very small, with Algorithm 2. Notice that the logarithmic complexity is due to the "ceiling" (we need to find the smallest frequency higher than a value). But in practice, we do not have to consider all frequencies, because the new $S_j[k].f$ is obviously higher than the previous one. And if $c_j$ is not too different from $w_j$, the new frequency will simply be the next one, or the one after that.
Algorithm 2 Adaptation of $S_i$ ($i < j$) (horizontal shift)

foreach $p \in \{1, \dots, |S_i|\}$ do
    $S_i[p].t \leftarrow \max\left\{0, S_i[p].t - \frac{c_j - w_j}{f_M}\right\}$
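The horizontal shift can be sketched in Python as follows. The handling of points that collapse onto time 0 (keeping only the latest one) is our own choice for this illustration; the pseudocode above does not specify it.

```python
def shift_left(points, extra_cycles, f_max):
    """Shift every step of a scheduling function to the left.

    points: sorted (time, frequency) steps of S_i;
    extra_cycles: c_j - w_j, the additional cycles of the overrunning task;
    f_max: highest frequency f_M.
    Returns the steps of t -> S_i(t + extra_cycles / f_max).
    """
    delta = extra_cycles / f_max
    shifted = {}
    for t, f in points:
        # Points pushed below 0 collapse onto t = 0; the latest one wins,
        # since it is the step in effect at time delta in the old function.
        shifted[max(0.0, t - delta)] = f
    return sorted(shifted.items())

# Hypothetical scheduling function, f_M = 100 cycles per time unit.
POINTS = [(0.0, 100), (4.0, 200), (7.5, 400)]
small = shift_left(POINTS, 100, 100)   # shift by 1 time unit
large = shift_left(POINTS, 500, 100)   # shift by 5 time units
```

A shift by one time unit keeps the three steps, starting at 0, 3 and 6.5. A shift by five time units removes the first step: the function then starts directly at frequency 200.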



If several tasks exceeded their WCEC in the last frame, we can just consider that this last frame is equivalent to as many frames as the number of tasks having exceeded their WCEC, with only one task exceeding its WCEC in each of those frames. We can then just apply the transformation given above successively, once for each task.

Notice that if a task has been killed after using $c_j$ cycles, it may happen that the new WCEC is actually larger than $c_j$. So we may consider a value higher than $c_j$ as the new WCEC. However, there is a non-null probability that $T_j$ still has some laxity after $c_j$ cycles in the next frame. As a consequence, the system will almost surely converge to the new correct WCEC, possibly after the task has been killed again.



4.3 Adapting Killing/Suspend Time

Let $\tilde{z}_i$ be the time at which $T_{i-1}$ is killed (or suspended). According to the relationship between $\tilde{z}_i$ and $z_i$, $\tilde{z}_i$ must also be adapted when some WCEC changes. Let $z'_i$ (resp. $\tilde{z}'_i$) be the danger zone (resp. kill/suspend time) after the change. If $j$ is the index of the task increasing its WCEC to $c_j$, we have

$$z_i = D - \frac{1}{f_M} \sum_{k=i}^{N} w_k$$

and

$$z'_i = \begin{cases} z_i - \frac{c_j - w_j}{f_M} & \text{if } i \le j, \\ z_i & \text{otherwise.} \end{cases}$$

Then, the new danger zone can be obtained by subtracting $\frac{c_j - w_j}{f_M}$ when $i \le j$.

If $\tilde{z}_i = z_i + (D - z_i)\,\delta_i$ (which includes "kill/suspend at danger zone" and "kill/suspend at D"), we have, if $i \le j$:

$$\tilde{z}'_i = z'_i + (D - z'_i)\,\delta_i = z_i - \frac{c_j - w_j}{f_M} + \left(D - z_i + \frac{c_j - w_j}{f_M}\right)\delta_i = \tilde{z}_i - (1 - \delta_i)\,\frac{c_j - w_j}{f_M}.$$

The new kill/suspend time is then obtained by subtracting $(1 - \delta_i)\,\frac{c_j - w_j}{f_M}$ for $i \le j$.

The percentile approach is a little bit more difficult to handle, because this method is based on the task length distribution, and we assume that the known distribution is obsolete when we need to adapt the $S$ functions, and therefore should not be used anymore. We then need to adapt the kill/suspend time according to some heuristic.

We propose two transformations for $\kappa_j$ (we assume $c_j > w_j$):

1. $\kappa'_j = \kappa_j \frac{c_j}{w_j}$

2. $\kappa'_j = \kappa_j + (c_j - w_j)$

The first adaptation assumes that the whole distribution is stretched from $[0, w_j]$ to $[0, c_j]$. The second adaptation assumes that the distribution is shifted upwards by $(c_j - w_j)$. We consider the generic percentile approach:

$$\tilde{z}_i = D - \frac{1}{f_M} \left( \sum_{k=i}^{m_i} \kappa_k + \sum_{k=i+K}^{N} w_k \right),$$

where $m_i = \min\{i + K - 1, N\}$. The first adaptation gives, for $i \le j$:

$$\tilde{z}'_i = \begin{cases} D - \frac{1}{f_M} \left( \sum_{k=i}^{m_i} \kappa_k + (\kappa'_j - \kappa_j) + \sum_{k=i+K}^{N} w_k \right) & \text{if } j \le m_i, \\ D - \frac{1}{f_M} \left( \sum_{k=i}^{m_i} \kappa_k + \sum_{k=i+K}^{N} w_k + (c_j - w_j) \right) & \text{otherwise,} \end{cases}$$

or

$$\tilde{z}'_i = \begin{cases} \tilde{z}_i - \frac{\kappa_j}{f_M} \left( \frac{c_j}{w_j} - 1 \right) & \text{if } j \le m_i, \\ \tilde{z}_i - \frac{c_j - w_j}{f_M} & \text{otherwise.} \end{cases}$$

If we consider the second adaptation, we can easily get, for $i \le j$:

$$\tilde{z}'_i = \tilde{z}_i - \frac{c_j - w_j}{f_M}.$$

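For the fixed-ratio rule $\tilde{z}_i = z_i + (D - z_i)\,\delta_i$, the update of the kill/suspend times after a WCEC increase reduces to a subtraction, which the following Python sketch illustrates. The function name, the list layout (index $i-1$ holds the value for index $i$) and the numbers are assumptions made for this example.

```python
def adapt_kill_times(kill_times, deltas, j, c_j, w_j, f_max):
    """New kill/suspend times after T_j (1-based) raised its WCEC
    from w_j to c_j: subtract (1 - delta_i) * (c_j - w_j) / f_M
    for every index i <= j, leave the other ones unchanged."""
    shift = (c_j - w_j) / f_max
    return [z - (1.0 - d) * shift if i <= j else z
            for i, (z, d) in enumerate(zip(kill_times, deltas), start=1)]

# Hypothetical frame: WCEC [20, 10, 30], D = 100, f_M = 1, delta = 0.5,
# which gives danger zones [40, 60, 70, 100] and the kill times below.
OLD = [70.0, 80.0, 85.0, 100.0]
new = adapt_kill_times(OLD, [0.5] * 4, j=2, c_j=14, w_j=10, f_max=1)
```

The result `[68.0, 78.0, 85.0, 100.0]` matches a full recomputation with the WCEC set `[20, 14, 30]`, whose danger zones start at 36, 56, 70 and 100.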



4.4 Decreasing WCEC

If the WCEC of some task has not been observed for a long time, we could consider reducing it. Let $c_j$ ($< w_j$) be the "new" worst case execution cycles of $T_j$. While the schedulability does not require the scheduling to be adapted, we could try to take this information into account in order to reduce the energy consumption. Again, we want to make simple changes before the scheduling functions are properly recomputed.

Of course, we still need to ensure the schedulability (according to the new WCEC set). For instance, we could use an adaptation very close to the one we did before:



s, t 



Sj(t): 



Si(t) 



IM 



if i < j, 



ifi 



if i > j, 



In the case $i < j$, we want to do a right shift. The schedulability of the case $i = j$ is very easy to show. The schedulability for the other cases can be proved in a very similar way as before. Algorithm 2 can be used with a slight adaptation.

5 Simulation Results 
5.1 Scenario 



killed before the system knows the new worst case execution times, and does not have to kill jobs any more (unless the load is too high and the schedulability cannot be guaranteed anymore). And this number of killed tasks only depends on the way the transition between the two phases happens, and not on the length of those two phases.

The second scenario uses data collected in an experimental environment in the Computer Science and Information Engineering department of the National Taiwan University, Taipei. We consider 9 tasks, and assume that they are distributed according to the workload presented in [2]. Again, we consider two phases, but much longer: the first phase contains 20,000 frames, the second one 4,000. See Figure 3. We refer to this scenario as the DTV workload.

As the second scenario contains a huge number of frames, the effect of the transition (number of killed jobs/lost cycles before the S-functions are up-to-date) is very small. In order to highlight this effect, we also present a short version of this scenario, keeping only 250 frames amongst the 20,000.

For every following figure except Figures 2 and 3, the horizontal axis represents the frame length. This means that to produce such a plot, we simulated the behaviour of a system with a large frame length (or deadline), measured some metric, and started the same experiment again with a shorter frame length. A large value of the frame length then corresponds to a low load, and a small value to a high load.

5.2 Fairness 

Measuring the fairness experimentally is not very easy, 
because for most cases, killing a job should be something 
quite rare. We first proposed to measure the laxity of a job 

as: 



In the following simulation results, we will present two 
different scenarios. This first one is very simple and smooth, 
the second one uses data coming from experimental mea- 
surement on video decoding platforms. For the first sce- 
nario, we assume 4 tasks, using a normal distribution for 
execution length. We consider two phases: the first one, for 
which characteristics are known by the system, and which 
contains roughly 160 frames, and the second phase (40 
frames), in which three out of the four tasks change their 
behaviour, and increase their average and worst execution 
time. This 200-frames scenario is of course run several hun- 
dreds of times to obtain usable statistics. Figure [2] shows 
graphically an example of actual job execution number of 
cycles. 

The reason we use such a short scenario (200 frames) 
is that the "critical interval" where jobs are killed is most 
of the time very short: if we consider a scenario close to 
the saturation, only a very few number of jobs need to be 



$$\ell_i = \frac{1}{\text{nb instances of } T_i} \sum_{\text{instance } k \text{ of } T_i} \frac{e_k}{r_k}$$

where $e_k$ is the number of cycles that instance $k$ actually ran,
and $r_k$ is the number of cycles that instance $k$ required. The
lower $\ell_i$, the higher the number of lost cycles. We define
the fairness as:

$$\frac{\min_i \{\ell_i\}}{\max_i \{\ell_i\}}$$

The drawback of this measurement is that a strategy
which kills very few jobs, but always the same one (for
instance the last one), will have a fairness very close to 1,
while a strategy which kills more often, but different jobs,
will have a higher fairness. However, intuitively, a fair
strategy should be one in which, when jobs are killed,
each task has a similar probability of being the victim. So we
propose instead
Figure 2 First scenario: we consider four tasks, with a normally distributed number of cycles. After approximately 160
frames, the average task length rises for three tasks. Vertical axis is the number of cycles, horizontal axis is the frame id,
or the time.

Figure 3 Second scenario (DTV workload): we consider nine tasks (we only present four of them here), using a number of
cycles observed in an experimental environment (see [2]).




$$\ell_i = \frac{1}{\text{nb killed of } T_i} \sum_{\text{killed instance } k \text{ of } T_i} \frac{e_k}{r_k}$$

Notice that at low loads, the number of killed jobs is usually
very small, so the fairness is computed on a small number
of jobs, which gives rather erratic values.
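As an illustration of this metric, here is a minimal sketch that computes the per-task laxity over killed instances only, and then the ratio between the smallest and the largest laxity. The function name `fairness`, the input layout and the two conventions flagged in the comments are assumptions made for this example; the text above does not specify how tasks without any killed job are handled.

```python
from collections import defaultdict


def fairness(killed_jobs, num_tasks):
    """Fairness computed on killed instances only.

    killed_jobs: iterable of (task_index, executed_cycles, required_cycles),
                 one entry per killed instance (hypothetical input layout).
    num_tasks:   number of tasks in the frame.
    """
    ratios = defaultdict(list)
    for task, executed, required in killed_jobs:
        # e_k / r_k: fraction of the required cycles that actually ran.
        ratios[task].append(executed / required)

    laxity = []
    for task in range(num_tasks):
        values = ratios.get(task)
        if values:
            # Mean of e_k / r_k over the killed instances of this task.
            laxity.append(sum(values) / len(values))
        else:
            # Assumption: a task that was never killed lost no cycles.
            laxity.append(1.0)

    highest = max(laxity)
    if highest == 0.0:
        # Assumption: every task lost everything, so all are treated equally.
        return 1.0
    return min(laxity) / highest
```

With only a handful of killed instances, a single additional kill changes the per-task mean noticeably, which is consistent with the erratic values observed at low loads.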

5.3 No preemption mechanism 

In this first set of simulations, using the workload
corresponding to Figure 2, we assume that we do not have any
preemption mechanism, and that we are then only allowed
to kill a job and not to resume it. We show in Figures 4, 5
and 6 the influence of the factor $\delta$, which gives the laxity
we authorise between the next danger zone
and the end of the frame, on the fairness, the killing rate
and the energy consumption.
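A minimal sketch of the role of this factor, assuming that the kill instant is a linear interpolation between the next danger zone and the end of the frame, as suggested by the expression $Z + (D - Z)\delta$ used for the danger zone adaptation. The function name `kill_time` and its parameter names are hypothetical.

```python
def kill_time(next_danger_zone, deadline, delta):
    """Instant at which a still-running job is killed (or suspended).

    delta = 0 kills at the next danger zone, delta = 1 kills at the
    end of the frame; values in between interpolate linearly.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta is expected to lie in [0, 1]")
    return next_danger_zone + (deadline - next_danger_zone) * delta
```

For a next danger zone at 6 time units and a frame ending at 10, `kill_time(6.0, 10.0, 0.0)` gives 6.0, `kill_time(6.0, 10.0, 1.0)` gives 10.0, and a factor of 0.2 places the kill instant one fifth of the way between the two.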

It is not surprising that if no flexibility is given (jobs are
killed at the next danger zone, $\delta = 0$), the fairness (Fig. 4)
is much better (values close to 1) than if we give a large
flexibility ($\delta = 1$), because in this latter case, the last jobs
are more likely to be killed than the first ones. On the other hand,
being more "rigid" increases the number of killed jobs at
the end of any frame, because we waste some free time that
would be available at the end. So choosing an appropriate
value of $\delta$ should be done carefully, according to the
kind of workload we consider, but also depending on whether
we have to give more importance to the fairness or to the killing rate.
In the example we give here, $\delta = 0.2$ seems to be a good
trade-off: the fairness (Fig. 4) is very close to that of $\delta = 0$, but
we are comparable to $\delta = 1$ regarding the killing rate
(Fig. 5). From the energy point of view (Fig. 6), we do not
see any significant difference between the different $\delta$'s.

Note that on the right side of the energy plot (Fig. 6),
we see a very large difference between the $\delta$'s. But at this load,
a huge number of jobs intrinsically need to be killed,
because the frame length is too small to allow tasks to finish.
So in most cases, designers are not really interested in the
system behaviour at those loads.

Figure 4 Fairness, without preemption mechanism, with
four tasks (normally distributed length). The closer to 1,
the fairer.

Figure 5 Killing rate, without preemption mechanism, with
four tasks (normally distributed length)

Figure 6 Energy consumptions, without preemption
mechanism, with four tasks, and normally distributed length

5.4 Preemption mechanism

First, we compare having and not having
a preemption mechanism (Figure 7). Quite obviously,
we observe that the killing rate is lowered when a
preemption/resuming mechanism is available. We show
for instance a simulation with the normal distribution and
$\delta = 0.2$. Notice that if we consider a high $\delta$, we do not see
any difference between using or not using a preemption
mechanism, because if $\delta$ is high enough, jobs are almost never
suspended, and only the last one in the frame is killed.

Figure 7 Comparison between using or not using
preemption mechanism, with four tasks (normally distributed
length), and $\delta = 0.2$.

In Figures 8, 9 and 10 we show respectively the energy
consumption, average number of cycles, and fairness for
various $\delta$ factors, when a preemption mechanism is available,
and using the normal distribution workloads. We can see in
this set of simulations that there is no need to let jobs run
in the danger zone: it is better to suspend them as soon as
they enter the next danger zone ($\delta = 0$), and resume them
when some slack time becomes available.



Figure 8 Relative energy consumption, with respect to
$\delta = 0$, with four tasks (normally distributed length), with
preemption mechanism (resume at slack)

Figure 9 Killing rate, with four tasks (normally distributed
length), and preemption mechanism (resume at slack)

Figure 10 Fairness, with four tasks (normally distributed
length), and preemption mechanism (resume at slack)
5.5 Realistic workload

In the next few plots, we show some results where
a more realistic workload (see Figure 3) has been used
for simulations. We aim here at comparing our adaptation
method to two other scenarios: first, a simple case where
the scheduling function is not adapted when the workload
changes, then an idealistic situation where we know in
advance when the change occurs, and we also know the
distribution of the two phases.

In the first figure (Figure 11), we observe that our
adaptation method saves much more energy than no adaptation at
all. The very low energy consumption we can see for the no
adaptation scenario at high load comes from the high rate of
killed jobs for this method. Indeed, all of those lost cycles
do not consume energy at all.

In Figure 12 we show the average killing rate for those
three scenarios. However, if we considered the same workload
as before, we could not see any difference between our
dynamic method and the "2 phases" method, because our
adaptation only needs to kill a few jobs, and this number
of jobs does not depend upon the length of the simulation.
This is why, in this plot, we present measurements done on a
much shorter workload (about 250 frames). Please notice that
the killing rate in the "no adaptation" scenario was not
affected significantly by the length of the simulation, as long
as we keep the same ratio between the first and the second
phase. We observe in this simulation that the killing rate of
our adaptation can be very close to zero, which means that
even if we do not have a good knowledge of the distribution
(and its worst case execution time), we can still, with
very small effort, avoid killing most of the jobs that would have
been killed if we needed to collect a new distribution before
adapting the S-functions.

Figure 11 Energy consumption, with 9 tasks (DTV workload).
We compare our dynamic adaptation mechanism
with a clairvoyant method (Two phases), and a simple
method which does not adapt its information when the
length rises.


Figure 12 Killing rate, with 9 tasks (DTV workload). Same
comparison as for Figure 11.

5.6 Miscellaneous experiments

We have also performed many experiments that we do not
present in detail here, for lack of space. Here are a few
conclusions we have drawn:

• We have observed almost no difference between
"resume at end" and "resume at slack". Slack can be
used very often, but when slack is used, we have already
entered the danger zone in most cases, and then the
frequency is often $f_{\max}$ up to the end.

• The two adaptation methods we proposed (using the
schedulability condition or the horizontal shift) do not
show any significant difference, whatever the metric we
consider.

• The same holds for the adaptations we proposed for the
danger zone: they both show the same performance or
fairness.

6 Conclusion 

In this paper, we have shown that with a small effort, we
can efficiently manage varying WCEC on DVS frame-based
systems. We provide several algorithms and methods which
first give an efficient behaviour as soon as some task
exceeds its WCEC, and secondly adapt the scheduling
functions to improve the schedulability. We provide several
proofs showing the correctness of our algorithms, and
present many simulation results attesting the performance
of the proposed methods. Through those simulations, we
have shown that we can be very close to a clairvoyant algorithm,
both from the killing rate point of view and from the energy
consumption point of view.